Bowman County
PromptSuite: A Task-Agnostic Framework for Multi-Prompt Generation
Habba, Eliya, Dahan, Noam, Lior, Gili, Stanovsky, Gabriel
Evaluating LLMs with a single prompt has proven unreliable, with small changes leading to significant performance differences. However, generating the prompt variations needed for a more robust multi-prompt evaluation is challenging, limiting its adoption in practice. To address this, we introduce PromptSuite, a framework that enables the automatic generation of various prompts. PromptSuite is flexible - working out of the box on a wide range of tasks and benchmarks. It follows a modular prompt design, allowing controlled perturbations to each component, and is extensible, supporting the addition of new components and perturbation types. Through a series of case studies, we show that PromptSuite provides meaningful variations to support strong evaluation practices. All resources, including the Python API, source code, user-friendly web interface, and demonstration video, are available at: https://eliyahabba.github.io/PromptSuite/.
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (3 more...)
Position: We Need Responsible, Application-Driven (RAD) AI Research
Hartman, Sarah, Ong, Cheng Soon, Powles, Julia, Kuhnert, Petra
This position paper argues that achieving meaningful scientific and societal advances with artificial intelligence (AI) requires a responsible, application-driven approach (RAD) to AI research. As AI is increasingly integrated into society, AI researchers must engage with the specific contexts where AI is being applied. This includes being responsive to ethical and legal considerations, technical and societal constraints, and public discourse. We present the case for RAD-AI to drive research through a three-staged approach: (1) building transdisciplinary teams and people-centred studies; (2) addressing context-specific methods, ethical commitments, assumptions, and metrics; and (3) testing and sustaining efficacy through staged testbeds and a community of practice. We present a vision for the future of application-driven AI research to unlock new value through technically feasible methods that are adaptive to the contextual needs and values of the communities they ultimately serve.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Oceania > Australia > Western Australia (0.04)
- Oceania > Australia > Northern Territory > Darwin (0.04)
- (6 more...)
- Law (1.00)
- Health & Medicine (1.00)
- Government (1.00)
- Food & Agriculture > Agriculture (0.93)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
- Information Technology > Artificial Intelligence > Applied AI (0.93)
Auto Review: Second Stage Error Detection for Highly Accurate Information Extraction from Phone Conversations
Qamar, Ayesha, Raghuvanshi, Arushi, Sathi, Conal, Son, Youngseo
Automating benefit verification phone calls saves time in healthcare and helps patients receive treatment faster. It is critical to obtain highly accurate information in these phone calls, as it can affect a patient's healthcare journey. Given the noise in phone call transcripts, we have a two-stage system that involves a post-call review phase for potentially noisy fields, where human reviewers manually verify the extracted data, a labor-intensive task. To automate this stage, we introduce Auto Review, which significantly reduces manual effort while maintaining a high bar for accuracy. This system, being highly reliant on call transcripts, suffers a performance bottleneck due to automatic speech recognition (ASR) issues. This problem is further exacerbated by the use of domain-specific jargon in the calls. In this work, we propose a second-stage postprocessing pipeline for accurate information extraction. We improve accuracy by using multiple ASR alternatives and a pseudo-labeling approach that does not require manually corrected transcripts. Experiments with general-purpose large language models and feature-based model pipelines demonstrate substantial improvements in the quality of corrected call transcripts, thereby enhancing the efficiency of Auto Review.
- North America > United States > North Dakota > Bowman County (0.15)
- Asia > Thailand > Bangkok > Bangkok (0.05)
- Europe > United Kingdom > North Sea > Southern North Sea (0.04)
- (4 more...)
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Kim, Sungnyun, Cho, Sungwoo, Bae, Sangmin, Jang, Kangwook, Yun, Se-Young
Audio-visual speech recognition (AVSR) incorporates auditory and visual modalities to improve recognition accuracy, particularly in noisy environments where audio-only speech systems are insufficient. While previous research has largely addressed audio disruptions, few studies have dealt with visual corruptions, e.g., lip occlusions or blurred videos, which are also detrimental. To address this real-world challenge, we propose CAV2vec, a novel self-supervised speech representation learning framework particularly designed to handle audio-visual joint corruption. CAV2vec employs a self-distillation approach with a corrupted prediction task, where the student model learns to predict clean targets, generated by the teacher model, with corrupted input frames. Specifically, we suggest unimodal multi-task learning, which distills cross-modal knowledge and aligns the corrupted modalities by predicting clean audio targets with corrupted video, and clean video targets with corrupted audio. This strategy mitigates the dispersion in the representation space caused by corrupted modalities, leading to more reliable and robust audio-visual fusion. Our experiments on robust AVSR benchmarks demonstrate that the corrupted representation learning method significantly enhances recognition accuracy across generalized environments involving various types of corruption. Our code is available at https://github.com/sungnyun/cav2vec.
- Europe > Switzerland > Zürich > Zürich (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Texas > Tom Green County (0.04)
- (3 more...)
AAAI News
The AAAI Press, distributed by The MIT Press, Massachusetts Institute of Technology, 5 Cambridge Center, Cambridge, Massachusetts 02142. To order, call toll free: (800) 356-0343 or (617) 625-8569. (Spring 2002) ... first time that AAAI's National Conference has been held in Canada ... In addition, the program chairs are experimenting with a new format for AAAI.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.34)
- North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- (17 more...)
- Education (1.00)
- Leisure & Entertainment (0.68)